Abstract. Bayes [Philos. Trans. R. Soc. Lond. 53 (1763) 370-418; 54
296-325] introduced the observed likelihood function to statistical in- 
ference and provided a weight function to calibrate the parameter; he 
also introduced a confidence distribution on the parameter space but 
did not provide present justifications. Of course the names likelihood 
and confidence did not appear until much later: Fisher [Philos. Trans. 
R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 222 (1922) 309-368] for 
likelihood and Neyman [Philos. Trans. R. Soc. Lond. Ser. A Math. 
Phys. Eng. Sci. 237 (1937) 333-380] for confidence. Lindley [J. Roy. 
Statist. Soc. Ser. B 20 (1958) 102-107] showed that the Bayes and 
the confidence results were different when the model was not location. 
This paper examines the occurrence of true statements from the Bayes 
approach and from the confidence approach, and shows that the pro- 
portion of true statements in the Bayes case depends critically on the 
presence of linearity in the model; and with departure from this linear- 
ity the Bayes approach can be a poor approximation and be seriously 
misleading. Bayesian integration of weighted likelihood thus provides 
a first-order linear approximation to confidence, but without linearity 
can give substantially incorrect results. 

Key words and phrases: Bayes, Bayes error rate, confidence, default 
prior, evaluating a prior, nonlinear parameter, posterior, prior. 



1. INTRODUCTION 

Statistical inference based on the observed likeli- 
hood function was initiated by Bayes (1763). This 
was, however, without the naming of the likelihood 
function or the apparent recognition that likelihood 
$L^0(\theta) = f(y^0; \theta)$ directly records the amount of probability
at an observed data point $y^0$; such appeared
much later (Fisher, 1922). 

Bayes' proposal applies directly to a model with
translation invariance that in current notation would
be written $f(y - \theta)$; it recommended that a weight
function or mathematical prior $\pi(\theta)$ be applied to
the likelihood $L(\theta)$, and that the product $\pi(\theta)L(\theta)$
be treated as if it were a joint density for $(\theta, y)$. Then
with observed data $y^0$ and the use of the conditional
probability lemma, a posterior distribution
$\pi(\theta \mid y^0) = c\,\pi(\theta)L^0(\theta)$ was obtained; this was viewed as
a description of possible values for $\theta$ in the presence of
data $y = y^0$. For the location model, as examined by
the Bayes approach, translation invariance suggests
a constant or flat prior $\pi(\theta) = c$ which leads to the
posterior distribution $\pi(\theta \mid y^0) = f(y^0 - \theta)$ and, in the
scalar case, gives the posterior survival probability
$s(\theta) = \int_\theta^\infty f(y^0 - \alpha)\,d\alpha$, recording alleged
probability to the right of a value $\theta$.
The probability interpretation that would seemingly
attach to this conditional calculation is as follows:
if the $\theta$ values that might have been present
in the application can be viewed as coming from the
frequency pattern $\pi(\theta)$ with each $\theta$ value in turn giving
rise to a $y$ value in accord with the model and if
the resulting $y$ values that are close to $y^0$ are examined,
then the associated $\theta$ values have the pattern
$\pi(\theta \mid y^0)$.

The complication is that $\pi(\theta)$ as proposed is a
mathematical construct and, correspondingly, $\pi(\theta \mid y^0)$
is just a mathematical construct. The argument using
the conditional probability lemma does not produce
probabilities from no probabilities: the probability
lemma when invoked for an application has
two distributions as input and one distribution as
output; and it asserts the descriptive validity of the
output on the basis of the descriptive validity of the
two inputs; if one of the inputs is absent and an
artifact is substituted, then the lemma says nothing,
and produces no probabilities. Of course, other
lemmas and other theory may offer something
appropriate.

We will see, however, that something different is
readily available and indeed available without the
special translation invariance. We will also see that
the procedure of augmenting likelihood $L^0(\theta)$ with
a modulating factor that expresses model structure
is a powerful first step in exploring information
contained in Fisher's likelihood function.

An alternative to the Bayes proposal was introduced
by Fisher (1930) as a confidence distribution.
For the scalar-parameter case we can record the
percentage position of the data point $y^0$ in the
distribution having parameter value $\theta$,

$$p(\theta) = p(\theta; y^0) = \int_{-\infty}^{y^0} f(y - \theta)\,dy.$$

This records the proportion of the $\theta$ population that
is less than the value $y^0$. For a general data point $y$
we have of course that $p(\theta; y)$ is uniformly distributed
on (0, 1), and, correspondingly, $p(\theta)$ from the data $y^0$
gives the upper-tail distribution function or survivor
function for confidence, as introduced by Fisher
(1935). A basic way of presenting confidence is in
terms of quantiles. If we set $p(\theta) = 0.95$ and solve
for $\theta$, we obtain $\theta = \hat\theta_{0.95}$ which is the value with
right tail confidence 95% and left tail confidence
5%; this would typically be called the 95% lower
confidence bound, and $(\hat\theta_{0.95}, \infty)$ would be the
corresponding 95% confidence interval.



For two-sided confidence the situation has some 
subtleties that are often overlooked. With the large 
data sets that have come from the colliders of High 
Energy Physics, a Poisson count can have a mean 
at a background count level or at a larger value if 
some proposed particle is actually present. A com- 
mon practice in the High Energy Physics literature 
(Mandelkern, 2002) has been to form two-sided con- 
fidence intervals and to allow the confidence con- 
tributions in the two tails to be different, thereby 
accommodating some optimality criterion; see also 
some discussion in Section 4. In practice, this meant 
that the confidence lower bound shied away from 
the critical parameter lower bound describing just 
the background radiation. This mismanaged the de- 
tection of a new particle. Accordingly, our view is 
that two-sided intervals should typically have equal 
or certainly designated amounts of confidence in the 
two tails. With this in mind, we now restrict the dis- 
cussion to the analysis of the confidence bounds as 
described in the preceding paragraph and view con- 
fidence intervals as being properly built on individ- 
ual confidence bounds with designated confidence 
values. 

As a simple example consider the Normal$(\mu, \sigma_0^2)$,
and let $\phi(z)$ and $\Phi(z)$ be the standard normal density
and distribution functions. The p-value from
data $y^0$ is

$$p(\mu) = \int_{-\infty}^{y^0} \frac{1}{\sigma_0}\,\phi\!\left(\frac{y - \mu}{\sigma_0}\right) dy = \Phi\!\left(\frac{y^0 - \mu}{\sigma_0}\right),$$

which has normal distribution function shape dropping
from 1 at $-\infty$ to 0 at $+\infty$; it records the probability
position of the data with respect to a possible
parameter value $\mu$; see Figure 1(a). From the
confidence viewpoint, $p(\mu)$ is recording the right tail
confidence distribution function, and the confidence
distribution is Normal$(y^0, \sigma_0^2)$.

The Bayes posterior distribution for $\mu$ using the
invariant prior has density $c\,\phi\{(y^0 - \mu)/\sigma_0\}$; this is
Normal$(y^0, \sigma_0^2)$. The resulting posterior survivor
function value is

$$s(\mu) = \int_\mu^\infty \frac{1}{\sigma_0}\,\phi\!\left(\frac{y^0 - \alpha}{\sigma_0}\right) d\alpha = \Phi\!\left(\frac{y^0 - \mu}{\sigma_0}\right),$$

and its values are indicated in Figure 1(b); the function
provides a probability-type evaluation of the
right tail interval $(\mu, \infty)$ for the parameter. For this
we have used the letter s to suggest the "survivor"
aspect of the Bayes analogue of the present one-sided
frequentist p-value.




Fig. 1. Normal$(\mu, 1)$ model: The density of $y$ given $\mu$ in (a);
the posterior density of $\mu$ given $y^0$ in (b). The p-value $p(\mu)$
from (a) is equal to the survivor value $s(\mu)$ in (b).
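
The agreement of the p-value and the survivor value in the Normal case can be checked numerically. The following is a minimal Python sketch using only the standard library; the values of `y0`, `sigma0` and `mu` are illustrative assumptions, not taken from the paper, and the survivor value is obtained by a simple midpoint integration of the flat-prior posterior density rather than from a closed form.

```python
from statistics import NormalDist

def p_value(mu, y0, sigma0):
    """Confidence p-value: probability left of y0 under Normal(mu, sigma0^2)."""
    return NormalDist(mu, sigma0).cdf(y0)

def s_value(mu, y0, sigma0, upper=12.0, steps=20000):
    """Flat-prior survivor value: posterior mass to the right of mu (midpoint rule)."""
    posterior = NormalDist(y0, sigma0)   # posterior for mu is Normal(y0, sigma0^2)
    hi = y0 + upper * sigma0             # stands in for +infinity
    if mu >= hi:
        return 0.0
    h = (hi - mu) / steps
    return sum(posterior.pdf(mu + (i + 0.5) * h) for i in range(steps)) * h

y0, sigma0 = 1.3, 2.0                    # illustrative values only
for mu in (-1.0, 1.3, 4.0):
    print(mu, round(p_value(mu, y0, sigma0), 6), round(s_value(mu, y0, sigma0), 6))
```

The two columns printed for each `mu` should coincide up to integration error, which is the equality $p(\mu) = s(\mu)$ in numerical form.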

For a second example consider the model $y = \theta + z$,
where $z$ has the standard extreme value distribution
with density $g(z) = e^{-z}\exp\{-e^{-z}\}$ and distribution
function $G(z) = \exp(-e^{-z})$. The p-value from data $y^0$
is

$$p(\theta) = \int_{-\infty}^{y^0} g(y - \theta)\,dy = G(y^0 - \theta) = \exp\{-e^{-(y^0 - \theta)}\},$$

which records the probability position of the data
in the $\theta$ distribution; it can be viewed as a right
tail distribution function for confidence. The posterior
distribution for $\theta$ using the Bayes invariant
prior has density $g(y^0 - \theta)$ and can be described as
a reversed extreme value distribution centered at $y^0$.
The posterior survivor function is

$$s(\theta) = \int_\theta^\infty g(y^0 - \alpha)\,d\alpha = \exp\{-e^{-(y^0 - \theta)}\},$$

and again agrees with the p-value $p(\theta)$; see Figure 2.

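Of course, in general for a location model $f(y - \theta)$
as examined by Bayes, we have

$$p(\theta) = \int_{-\infty}^{y^0} f(y - \theta)\,dy = \int_{-\infty}^{y^0 - \theta} f(z)\,dz = \int_\theta^\infty f(y^0 - \alpha)\,d\alpha = s(\theta),$$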

and, thus, the Bayes posterior distribution is equal 
to the confidence distribution. Or, more directly, the 
Bayes posterior distribution is just standard confi- 
dence. 
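
The change of variable behind this identity can be illustrated numerically for the extreme value example. The sketch below is an assumption-laden illustration, not part of the original development: the helper names, the data value used in the check, and the truncation of the infinite limits at ten units are all choices made here for convenience.

```python
import math

def g(z):
    """Standard extreme value density g(z) = exp(-z) * exp(-exp(-z))."""
    return math.exp(-z - math.exp(-z))

def G(z):
    """Standard extreme value distribution function."""
    return math.exp(-math.exp(-z))

def midpoint(f, a, b, steps=40000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

def p_value(theta, y0):
    # integrate the model density over y, from effectively -infinity up to y0
    return midpoint(lambda y: g(y - theta), theta - 10.0, y0)

def s_value(theta, y0):
    # integrate the flat-prior posterior over alpha, from theta to effectively +infinity
    return midpoint(lambda a: g(y0 - a), theta, y0 + 10.0)

y0 = 0.7                                  # illustrative data value
for theta in (-1.0, 0.7, 2.5):
    print(theta, p_value(theta, y0), s_value(theta, y0), G(y0 - theta))
```

All three printed values for a given `theta` should agree to several decimals; replacing `g` by any other density of location form would leave the agreement intact, which is the content of the displayed identity.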
Fig. 2. The extreme value EV$(\theta, 1)$ model: the density of $y$
given $\theta$ in (a); the posterior density of $\theta$ given $y^0$ in (b). The
p-value $p(\theta)$ from (a) is equal to the survivor value $s(\theta)$ in (b).

Lindley (1958) presented this result and under 
suitable change of variable and parameter showed 
more: that the p-value and s-value are equal if and 
only if the model $f(y; \theta)$ is a location model $f(y - \theta)$.
In his perspective then, this argued that the confi- 
dence approach was flawed, confidence as obtained 
by inverting the p-value function as a pivot. From 
a different perspective, however, it argues equally 
that the Bayes approach is flawed, and does not have 
the support of the confidence interpretation unless 
the model is location. 
Lindley objected also to the term probability being
attached to the original Fisher word for confidence,
viewing probability as appropriate only in
reference to the conditional type calculations used 
by Bayes. By contrast, repetition properties for con- 
fidence had been clarified by Neyman (1937). As 
a consequence, in the discipline of statistics, the 
terms probability and distribution were then typi- 
cally not used in the confidence context, but were in 
the Bayes context. The repetition properties, how- 
ever, do not extend to the Bayes calculation ex- 
cept for simple location cases, as we will see; but 
they do extend for the confidence inversion. We take 
this as strong argument that the term probability 
is less appropriate in the Bayesian weighted likeli- 
hood context than in the confidence inversion con- 
text. 

The location model, however, is extremely spe- 
cial in that the parameter has a fundamental lin- 
earity and this linearity is expressed in the use of 
the flat prior with respect to the location param- 
eter. Many extensions of the Bayes mathematical 
prior have been proposed trying to achieve the fa- 
vorable behavior of the original Bayes, for example, 
Jeffreys (1939, 1946) and Bernardo (1979). We refer 
to such priors as default priors, priors to elicit infor- 
mation from an observed likelihood function. And 
we will show that if the parameter departs from 
a basic linearity, then the Bayes posterior can be 
seriously misleading. Specifically, we will show that 
with moderate departures from linearity the Bayes 
calculation can give an acceptable approximation to 
confidence, but that with more extreme departure 
from linearity or with large parameter dimension it 
can give unacceptable approximations. 

John Tukey actively promoted a wealth of simple 
statistical methods as a means to explore data; he 
referred to them as quick and dirty methods. They 
were certainly quick using medians and ranges and 
other easily accessible characteristics of data. And 
they were dirty in the sense of ignoring character- 
istics that in the then currently correct view were 
considered important. We argue that Bayes poste- 
rior calculations can appropriately be called quick 
and dirty, quick and dirty confidence. 

There are also extensions of the Bayes approach 
allowing the prior to reflect the viewpoint or judg- 
ment or prejudice of an investigator; or to reflect 
the elicited considerations of those familiar with the 
context being investigated. Arguments have been 
given that such a viewpoint or consideration can be
expressed as probability; but the examples that we
present suggest otherwise. 

There are of course contexts where the true value of 
the parameter has come from a source with a known 
distribution; in such cases the prior is real, it is ob- 
jective, and could reasonably be considered to be 
a part of an enlarged model. Then whether to in- 
clude the prior becomes a modeling issue. Also, in 
particular contexts, there may be legal, ethical or 
moral issues as to whether such outside information 
can be included. If included, the enlarged model is 
a probability model and accordingly is not statisti- 
cal: as such, it has no statistical parameters in the 
technical sense and thus predates Bayes and can be 
viewed as being probability itself not Bayesian. Why 
this would commonly be included in the Bayesian 
domain is not clear; it is not indicated in the orig- 
inal Bayes, although it was an area neglected by 
the frequentist approach. Such a prior describing 
a known source is clearly objective and can properly 
be called an objective prior; this conflicts, however, 
with some recent Bayesian usage where the term ob- 
jective is misapplied and refers to the mathematical 
priors that we are calling default priors. 

In Section 2 we consider the scalar variable scalar 
parameter case and determine the default prior that 
gives posteriors with reliable quantiles; some details 
for the vector parameter case are also discussed. In 
Section 3 we argue that the only satisfactory way 
to assess distributions for unobserved quantities is 
by means of the quantiles of such distributions; this 
provides the basis then for comparing the Bayesian 
and frequentist approaches. 

In Sections 4-6 we explore a succession of exam- 
ples that examine how curvature in the model or 
in the parameter of interest can destroy any confi- 
dence reliability in the default Bayes approach, and 
thus in the Bayesian use of just likelihood to present 
a distribution purporting to describe an unknown 
parameter value. 

In Sections 7 and 8 we discuss the merits of the 
conditional probability formula when used with a ma- 
thematical prior and also the merits of the optimal- 
ity approach; then Section 9 records a brief discus- 
sion and Section 10 a summary. 

2. BUT IF THE MODEL IS NONLINEAR 

With a location model the confidence approach
gives $p(\theta)$ and the default Bayes approach gives $s(\theta)$,
and these are equal. Now consider things more generally
and initially examine just a statistical model
$f(y; \theta)$ where both $y$ and $\theta$ are scalar or real
valued as opposed to vector valued, but without the
assumed linear relationship just discussed.

Confidence is obtained from the observed distribution
function $F^0(\theta)$ and a posterior is obtained from
the observed density function $f^0(\theta)$. For convenience
we assume minimum continuity and that $F(y; \theta)$ is
stochastically increasing and attains both 0 and 1
under variation in $y$ or $\theta$. The confidence p-value is
directly the observed distribution function,

$$p(\theta) = F^0(\theta) = F(y^0; \theta),$$

which can be rewritten mechanically as

$$p(\theta) = \int_\theta^\infty -F_{;\theta}(y^0; \alpha)\,d\alpha;$$

the subscript denotes partial differentiation with respect
to the corresponding argument. The default
Bayes s-value is obtained from likelihood, which is
the observed density function $f(y^0; \theta) = F_y(y^0; \theta)$:

$$s(\theta) = \int_\theta^\infty \pi(\alpha)\,F_y(y^0; \alpha)\,d\alpha.$$

If $p(\theta)$ and $s(\theta)$ are in agreement, then the direct
comparison of the integrals implies that

$$\pi(\theta) = -\frac{F_{;\theta}(y^0; \theta)}{F_y(y^0; \theta)}.$$

This presents $\pi(\theta)$ as a possibly data-dependent
prior. Of course, data-dependent priors have a long
but rather infrequent presence, for example, Box
and Cox (1964), Wasserman (2000) and Fraser et
al. (2010b). The preceding expression for the prior
can be rewritten as

$$\pi(\theta) = \left.\frac{dy}{d\theta}\right|_{y^0}$$

by directly differentiating the quantile function
$y = y(u, \theta)$ for fixed p-value $u = F(y; \theta)$ and taking
the observed value, or by taking the total differential
of $F(y; \theta)$; Lindley's (1958) result follows by noting
that the differential equation $dy/d\theta = a(\theta)/b(y)$
integrates to give a location model.
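
As a hedged illustration of this default prior, consider a model that is not of location form: an exponential variable with mean $\theta$, so that $F(y; \theta) = 1 - e^{-y/\theta}$. This model is chosen here only as an example and is not one treated in the paper. For it, $-F_{;\theta}/F_y = y/\theta$, so the default prior at data $y^0$ is $y^0/\theta$. The Python sketch below checks that value by finite differences and then checks that the weighted-likelihood integral reproduces the p-value; the function names and the substitution $t = 1/\alpha$ used for the integral are assumptions of the sketch.

```python
import math

def F(y, theta):
    """Distribution function of an exponential variable with mean theta."""
    return 1.0 - math.exp(-y / theta)

def default_prior(theta, y0, eps=1e-6):
    """pi(theta) = -F_theta / F_y at the data y0, by central finite differences."""
    F_theta = (F(y0, theta + eps) - F(y0, theta - eps)) / (2 * eps)
    F_y = (F(y0 + eps, theta) - F(y0 - eps, theta)) / (2 * eps)
    return -F_theta / F_y

def s_value(theta, y0, steps=20000):
    """Integral over (theta, infinity) of prior times likelihood, using t = 1/alpha."""
    h = (1.0 / theta) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        alpha = 1.0 / t
        prior = y0 / alpha                         # closed form of the default prior here
        likelihood = math.exp(-y0 / alpha) / alpha  # density f(y0; alpha)
        total += prior * likelihood / t**2 * h      # d(alpha) = dt / t^2
    return total

y0 = 2.0                                            # illustrative data value
for theta in (0.5, 1.0, 3.0):
    print(theta, default_prior(theta, y0), y0 / theta, s_value(theta, y0), F(y0, theta))
```

With the data-dependent weight $y^0/\theta$ the s-value matches the p-value; a constant prior would not give this agreement for the same model.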

Now suppose we go beyond the simple case of the
scalar model and allow that $y$ is a vector of length $n$
and $\theta$ is a vector of length $p$. In many applications
$n > p$; but here we assume that dim $y$ has been reduced
to $p$ by conditioning (see, e.g., Fraser, Fraser
and Staicu, 2010c), and that a smooth pivot $z(y, \theta)$
with density $g(z)$ describes how the parameter affects
the distribution of the variable. The density
for $y$ is available by inverting from pivot to sample
space:

$$g(z)\,dz = f(y; \theta)\,dy = g\{z(y; \theta)\}\,|z_y(y; \theta)|\,dy,$$

where the subscript again denotes partial differentiation.

For confidence a differential element is obtained
by inverting from pivot to parameter space:

$$g(z)\,dz = g\{z(y^0; \theta)\}\,|z_{;\theta}(y^0; \theta)|\,d\theta.$$

And for posterior probability the differential element
is obtained as weighted likelihood

$$g(z)\,dz = g\{z(y^0; \theta)\}\,|z_y(y^0; \theta)|\,\pi(\theta)\,d\theta.$$

The confidence and posterior differential elements
are equal if

$$\pi(\theta) = \frac{|z_{;\theta}(y^0; \theta)|}{|z_y(y^0; \theta)|};$$

we call this the default prior for the model $f(y; \theta)$
with data $y^0$. As $dy/d\theta = -z_y^{-1}(y^0; \theta)\,z_{;\theta}(y^0; \theta)$ for
fixed $z$, we will have confidence equal to posterior
if $\pi(\theta) = |dy/d\theta|_{y^0}$, a simple extension of the scalar
case. The matrix $dy/d\theta|_{y^0}$ can be called the sensitivity
of the parameter at the data point $y^0$ and the
determinant provides a natural weighting or scaling
function $\pi(\theta)$ for the parameter; this sensitivity
is just presenting how parameter change affects
the model and is recording this just at the relevant
point, the observed data.

3. HOW TO EVALUATE A POSTERIOR 
DISTRIBUTION 

(i) Distribution function or quantile function. In
the scalar parameter case, both $p(\theta)$ and $s(\theta)$ have
the form of a right tail distribution function or survivor
function. In the Bayesian framework, the function
$s(\theta)$ is viewed as a distribution of posterior
probability. In the frequentist framework, the function
$p(\theta)$ can be viewed as a distribution of confidence,
as introduced by Fisher (1930) but originally
called fiducial; it has long been a familiar theme,
frequentist or Bayesian, that it is inappropriate to treat
such a function as a distribution describing possible
values for the true parameter.

For a scalar parameter model with data, the Bayes
and the confidence approaches each lead
to a probability-type evaluation on the parameter
space; and these can be different as Lindley (1958)
demonstrated and as we have quantified in the preceding
section. Surely then, they cannot both be correct.
So, how to evaluate such posterior distributions
for the parameter?

A probability description is a rather complex thing
even for a scalar parameter: ascribing a probability-type
assessment to one-sided intervals, two-sided intervals,
and more general sets. What seems more
tangible but, indeed, is equivalent is to focus on the
reverse, the quantiles: choose an amount $\beta$ of probability
and then determine the corresponding quantile
$\hat\theta_\beta$, a value with the alleged probability $1 - \beta$
to the left and with $\beta$ to the right. We then have
that a particular interval $(\hat\theta_\beta, \infty)$ from the data has
the alleged amount $\beta$. Here we focus on such quantiles
$\hat\theta_\beta$ on the scale for $\theta$. In particular, we might
examine the 95% quantile $\hat\theta_{0.95}$, the median quantile
$\hat\theta_{0.50}$, the 5% quantile $\hat\theta_{0.05}$, and others, all as
part of examining an alleged distribution for $\theta$ obtained
from the data.

For the Normal$(\mu, \sigma_0^2)$ example with data $y^0$, the
confidence approach gives the $\beta$-level quantile

$$\hat\mu_\beta = y^0 - z_\beta \sigma_0,$$

where $\Phi(z_\beta) = \beta$ as based on the standard normal
distribution function $\Phi$. In particular, the 95%, 50%
and 5% quantiles are

$$\hat\mu_{0.95} = y^0 - 1.64\sigma_0, \qquad \hat\mu_{0.50} = y^0, \qquad \hat\mu_{0.05} = y^0 + 1.64\sigma_0;$$

and the corresponding confidence intervals are

$$(y^0 - 1.64\sigma_0, \infty), \qquad (y^0, \infty), \qquad (y^0 + 1.64\sigma_0, \infty),$$

with the lower confidence bound in each case recording
the corresponding quantile.
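
These quantiles are simple to compute. A small Python sketch follows, with `y0` and `sigma0` chosen arbitrarily for illustration; it solves $\Phi\{(y^0 - \mu)/\sigma_0\} = \beta$ for $\mu$ using the standard library's normal quantile function.

```python
from statistics import NormalDist

def confidence_quantile(beta, y0, sigma0):
    """Lower confidence bound at level beta: y0 - z_beta * sigma0."""
    z_beta = NormalDist().inv_cdf(beta)   # standard normal beta-quantile
    return y0 - z_beta * sigma0

y0, sigma0 = 2.0, 3.0                     # illustrative values only
for beta in (0.95, 0.50, 0.05):
    bound = confidence_quantile(beta, y0, sigma0)
    # the p-value evaluated at the bound recovers the level beta
    print(beta, bound, NormalDist(bound, sigma0).cdf(y0))
```

The last column confirms that the p-value function evaluated at each bound returns the stated level.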

Now more generally suppose we have a model
$f(y; \theta)$ and data $y^0$, and that we want to evaluate a
proposed procedure, Bayes, frequentist or other, that
gives a probability-type evaluation of where the true
parameter $\theta$ might be. As just discussed, we can focus
on some level, say, $\beta$, and then examine the corresponding
quantile $\hat\theta_\beta$ or the related interval $(\hat\theta_\beta, \infty)$.
In any particular instance, either the true $\theta$ is in the
interval $(\hat\theta_\beta, \infty)$, or it is not. And yet the procedure
has put forward a numerical level $\beta$ for the presence
of $\theta$ in $(\hat\theta_\beta, \infty)$. What does the asserted level $\beta$
mean?

(ii) Evaluating a proposed quantile. The definitive
evaluation procedure is in the literature: use a Neyman
(1937) diagram. The model $f(y; \theta)$ sits on the
space $S \times \Omega$ which here is the real line for $S$ cross
the real line for $\Omega$; this is just the plane $R^2$. For
any particular $y$ the procedure gives a parameter
interval $(\hat\theta_\beta(y), \infty)$; if we then assemble the resulting
intervals, we obtain a region

$$A_\beta = \bigcup_y \{y\} \times (\hat\theta_\beta(y), \infty) = \{(y, \theta) : \theta \text{ in } (\hat\theta_\beta(y), \infty)\}$$

on the plane. For the confidence procedure in the
simple Normal$(\theta, 1)$ case, Figure 3 illustrates the
97.5% quantile $\hat\theta_{0.975}$ for that confidence procedure;
the region $A_\beta = A_{0.975}$ is to the upper left of the
angled line and it represents the $\beta = 97.5\%$ allegation
concerning the true $\theta$, as proceeding from the
confidence procedure.

Fig. 3. The 97.5% allegation for the Normal confidence procedure,
on the $(y, \theta)$-space.

Now, more generally for a scalar parameter, we
suggest that the sets $A_\beta$ present precisely the essence
of a posterior procedure: how the procedure presents
information concerning the unknown $\theta$ value. We
can certainly examine these for various values of $\beta$
and thus investigate the merits of any claim implicit
in the alleged levels $\beta$.

The level $\beta$ is attached to the claim that $\theta$ is
in $(\hat\theta_\beta(y), \infty)$, or, equivalently, that $(y, \theta)$ is in the
set $A_\beta$. In any particular instance, there is of course
a true value $\theta$, and either it is in $(\hat\theta_\beta(y), \infty)$ or it
is not in $(\hat\theta_\beta(y), \infty)$. And the true $\theta$ did precede
the generation of the observed $y$ in full accord with
the probabilities given by the model. Accordingly,
a value $\theta$ for the parameter in the model implies an
actual Proportion of true assertions consequent to
that $\theta$ value:

$$\mathrm{Propn}(A_\beta; \theta) = \Pr\{A_\beta \text{ includes } (y, \theta); \theta\}.$$

This allows us to check what relationship the actual
Proportion bears to the value $\beta$ asserted by the
procedure: is it really $\beta$ or is it something else?
Of course, there may be contexts where in addition
we have that the $\theta$ value has been generated
by some random source described by an available
prior density $\pi(\theta)$, and we would be interested in
the associated Proportion,

$$\mathrm{Propn}(A_\beta; \pi) = \int \mathrm{Propn}(A_\beta; \theta)\,\pi(\theta)\,d\theta,$$

presenting the average relative to the source density
$\pi(\theta)$.

(iii) Comparing proposed quantiles. For the Bayes
procedure with the special original linear model
$f(y - \theta)$ we have by the usual calculations that

$$\mathrm{Propn}(A_\beta; \theta) = \beta$$

for all $\theta$ and $\beta$: the alleged level $\beta$ agrees with the
actual Proportion of true assertions that are made.
And, more generally, if the $\theta$ value has been generated
by a source $\pi(\theta)$, then it follows that the alleged
level $\beta$ does agree with the actual Proportion: thus,
$\mathrm{Propn}(A_\beta; \pi) = \beta$.

For the standard confidence procedure in the context
of an arbitrary continuous scalar model $f(y; \theta)$,
we have by the standard calculations that

$$\mathrm{Propn}(A_\beta; \theta) = \Pr\{(y, \theta) \text{ in } A_\beta; \theta\} = \Pr\{F(y; \theta) < \beta; \theta\} = \beta$$

for all $\theta$ and $\beta$. Of course, in the special Bayes location
model $f(y - \theta)$ the Bayes original procedure
does coincide with the confidence procedure: the
original Bayes was confidence in disguise.
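
The coverage statement for the confidence procedure can be checked by simulation in a model that is not of location form. The sketch below is an illustration under assumptions made here, not part of the paper: it uses an exponential variable with mean $\theta$, for which $F(y; \theta) = 1 - e^{-y/\theta}$ and the level-$\beta$ lower bound is $\hat\theta_\beta(y) = -y/\log(1 - \beta)$; the sample size and seed are arbitrary.

```python
import math
import random

def lower_bound(y, beta):
    """Confidence quantile for an exponential mean: solves 1 - exp(-y/theta) = beta."""
    return -y / math.log(1.0 - beta)

def proportion(theta, beta, n=100_000, seed=0):
    """Monte Carlo estimate of Propn(A_beta; theta) for the confidence procedure."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        y = rng.expovariate(1.0 / theta)   # expovariate takes the rate, 1/mean
        if lower_bound(y, beta) < theta:   # is the true theta in (bound, infinity)?
            hits += 1
    return hits / n

for theta in (0.5, 5.0):
    print(theta, proportion(theta, 0.9))
```

The estimated proportions should be close to the nominal 0.9 whatever value of `theta` generated the data, in line with the displayed calculation.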

Now for some proposed procedure having a region
$A_\beta$ with alleged level $\beta$, there is of course the
possibility that the actual Proportion is less than $\beta$
for some $\theta$ and is greater than $\beta$ for some other $\theta$ and
yet when averaged with a particular prior $\pi(\theta)$ gives
a revised $\mathrm{Propn}(A_\beta; \pi)$ that does have the value $\beta$;
the importance or otherwise of this we will discuss
later.

But we now ask, what is the actual Proportion
for a Bayes procedure in nonlocation models? Toward
this, we next examine a succession of examples
where the linearity is absent to varying degrees,
where the parameter to variable relationship is
nonlinear!

4. NONLINEARITY AND BOUNDED
PARAMETER: THE ERRORS ARE $O(1)$

We first examine an extreme form of nonlinearity,
where the range of the parameter is bounded.
This is a familiar problem in the current High Energy
Physics of particle accelerators and the related
search and detection of possible new particles: a particle
count has a Poisson$(\theta)$ distribution but $\theta$ is
bounded below by $\theta_0$ which represents the contribution
from background radiation. For some discussion
see Mandelkern (2002), Reid and Fraser (2003)
and Fraser, Reid and Wong (2004).

The critical issues are more easily examined in
a continuous context. For this, suppose that $y$ is
Normal$(\theta, \sigma_0^2)$ and that it is known that $\theta \geq \theta_0$ with
an interest in detecting whether $\theta$ is actually larger
than $\theta_0$; let $y^0$ be the observed data point; this continuous
version was also mentioned in Mandelkern
(2002), Woodroofe and Wang (2000) and Zhang and
Woodroofe (2003). For convenience here and without
loss of generality, we take the known $\sigma_0 = 1$ and
the lower bound $\theta_0 = 0$.

From a frequentist viewpoint, there is the likelihood

$$L^0(\theta) = c\,\phi(y^0 - \theta)$$

recording probability at the data, again using $\phi(z)$
for the standard normal density. And also there is
the p-value

$$p(\theta) = \Phi(y^0 - \theta)$$

recording probability left of the data. They each offer
a basic presentation of information concerning
the parameter value $\theta$; see Figure 4(a) and (b). Also
note that $p(\theta)$ does not reach the value 1 at the lower
limit $\theta_0$ for $\theta$; of course, the value is just recording
the statistical position of the data $y^0$ under possible
$\theta$ values, so there is no reason to want or expect
such a limit.

First consider the confidence approach. The interval
$(0, \beta)$ for the p-value function gives the interval
$\{\max(\theta_0, y^0 - z_\beta), \infty\}$ for $\theta$ when we acknowledge
the lower bound, or gives the interval
$(y^0 - z_\beta, \infty)$ when we ignore the lower bound. In either
case the actual Proportion is equal to the alleged
value $\beta$, regardless of the true value of $\theta$. There
might perhaps be mild discomfort that if we ignore
the lower bound and calculate the interval, then
it can include parameter values that are not part
of the problem; but nonetheless the alleged level is
valid.

Now consider the default Bayes approach. The
model $f(y; \theta) = \phi(y - \theta)$ is translation invariant for
$\theta \geq \theta_0$, and this would indicate the constant prior
$\pi(\theta) = c$, at least for $\theta \geq \theta_0$. Combining the prior
and likelihood and norming as usual gives the posterior
density

$$\pi(\theta \mid y^0) = \frac{\phi(y^0 - \theta)}{\Phi(y^0)}, \qquad \theta \geq 0,$$

and then gives the posterior survivor value

$$s(\theta) = \frac{\Phi(y^0 - \theta)}{\Phi(y^0)}.$$

See Figure 4(c). The $\beta$-quantile of this truncated
normal distribution for $\theta$ is obtained by setting
$s(\theta) = \beta$ and solving for $\theta$:

$$\hat\theta_\beta = y^0 - z_{\beta\Phi(y^0)},$$

where again $z_\gamma$ designates the standard normal
$\gamma$-quantile.

Fig. 4. The Normal$(\theta, 1)$ with $\theta \geq \theta_0 = 0$: (a) the likelihood
function $L(\theta)$; (b) p-value function $p(\theta) = \Phi(y^0 - \theta)$;
(c) s-value function $s(\theta) = \Phi(y^0 - \theta)/\Phi(y^0)$.

Fig. 5. Normal with bounded mean: the actual Proportion
for the claimed level $\beta = 50\%$ is strictly less than the claimed
50%.

We are now in a position to calculate the actual
Proportion, namely, the proportion of cases where
it is true that $\theta$ is in the quantile interval, or, equivalently,
the proportion of cases where $(\hat\theta_\beta, \infty)$ includes
the true $\theta$ value:

$$\mathrm{Propn}(\theta) = \Pr\{y - z_{\beta\Phi(y)} < \theta; \theta\} = \Pr\{z < z_{\beta\Phi(\theta + z)}\},$$

where $z$ is taken as being Normal(0, 1); this expression
can be written as an integral $\int_S \phi(z)\,dz$ with
$S = \{z : \Phi(z) < \beta\Phi(\theta + z)\}$ and can routinely be evaluated
numerically for particular values of $\theta$ and $\beta$. In
particular, for $\theta$ at the lower limit $\theta = \theta_0 = 0$ the coverage
set $S$ becomes $S = \{z : \Phi(z) < \beta\Phi(z)\}$, which is
clearly the empty set unless $\beta = 1$. In particular, at
the lower limit $\theta = \theta_0 = 0$ the $\mathrm{Propn}(\theta_0)$ has the phenomenal
value zero, $\mathrm{Propn}(\theta_0) = 0$, which is a consequence
of the empty set just mentioned; certainly an
unusual performance property for a claimed Bayes
coverage of, say, $\beta$ when, as typical, $\beta$ is not zero.

In Figure 5 we plot this Proportion against $\theta$ for
the $\beta = 50\%$ quantile; and we note that it is uniformly
less than the nominal, the claimed 50%. In
particular, at the lower limit $\theta = \theta_0 = 0$ the
$\mathrm{Propn}(\theta_0)$ has the phenomenal value zero, as
mentioned in the preceding paragraph; certainly an
unusual performance property for a claimed Bayes
coverage of $\beta = 50\%$! Then in Figure 6 we plot the
proportion for $\beta = 90\%$ and for $\beta = 10\%$; again we
note that the actual Proportion is uniformly less
than the claimed value, and again $\mathrm{Propn}(\theta)$ has the
extraordinary coverage value 0 when the parameter
is at the lower bound 0. Of course, the departure
would be in the other direction in the case of an
upper bound.

Fig. 6. Normal with bounded mean: the actual Proportions
for the claimed level $\beta = 90\%$ and $\beta = 10\%$ are strictly less
than the claimed.

In summary, in a context with a bound on the
parameter, the performance error with the Bayes
calculation can be of asymptotic order $O(1)$.

5. NONLINEARITY AND PARAMETER
CURVATURE: THE ERRORS ARE $O(n^{-1/2})$

A bound on a parameter as just discussed is a rather
extreme form of nonlinearity. Now consider a very
direct and common form of curvature. Let $(y_1, y_2)$
be Normal$(\theta; I)$ on $R^2$ and consider the quadratic interest
parameter $(\theta_1^2 + \theta_2^2)$, or the equivalent
$\rho(\theta) = (\theta_1^2 + \theta_2^2)^{1/2}$ which has the same dimensional units as
the $\theta_i$, and let $y^0 = (y_1^0, y_2^0)$ be the observed data. For
asymptotic analysis we would view the present variables
as being derived from some antecedent sample
of size $n$ and they would then have the Normal$(\theta, I/n)$
distribution.

From the frequentist view there is an observable
variable $r = (y_1^2 + y_2^2)^{1/2}$ that in some pure physical
sense measures the parameter $\rho$. It has a noncentral
chi distribution with noncentrality $\rho$ and degrees of
freedom 2. For convenience we let $\chi_2(\rho)$ designate
such a variable with distribution function $H_2(\chi; \rho)$,
which is typically available in computer packages;
and its square can be expressed as $\chi_2^2 = (z_1 + \rho)^2 + z_2^2$
in terms of standard normal variables and it has the
noncentral chi-square distribution with 2 degrees of
freedom and noncentrality usually described by $\rho^2$.
The distribution of $r$ is free of the nuisance parameter
which can conveniently be taken as the polar
angle $\alpha = \arctan(\theta_2/\theta_1)$. The resulting p-value function
for $\rho$ is

$$p(\rho) = \Pr\{\chi_2(\rho) \leq r^0\} = H_2(r^0; \rho). \tag{1}$$

See Figure 7(a), where for illustration we examined
the behavior for $\theta = y^0 + 1$.

From the frequentist view there is the directly
measured p-value $p(\rho)$ with a Uniform(0, 1) distribution,
and any $\beta$ level lower confidence quantile is
available immediately by solving $\beta = H_2(r^0; \rho)$ for $\rho$
in terms of $r^0$.

From the Bayes view there is a uniform prior
$\pi(\theta) = c$ as directly indicated by Bayes (1763) for
a location model on the plane $R^2$. The corresponding
posterior distribution for $\theta$ is then Normal$(y^0; I)$
on the plane. And the resulting marginal posterior
for $\rho$ is described by the generic variable $\chi_2(r^0)$. As $r$
is stochastically increasing in $\rho$, we have that the
Bayes analog of the p-value is the posterior survivor
value obtained by an upper tail integration

$$s(\rho) = \Pr\{\chi_2(r^0) \geq \rho\} = 1 - H_2(\rho; r^0). \tag{2}$$

The Bayes $s(\rho)$ and the frequentist $p(\rho)$ are actually quite different, a direct consequence of the obvious curvature in the parameter $\rho = (\theta_1^2 + \theta_2^2)^{1/2}$. The presence of the difference is easily assessed visually in Figure 7 by noting that in either case there is a rotationally symmetric normal distribution with unit standard deviation which is at the distance $d = 1$ from the curved boundary used for the probability calculations, but the curved boundary is cupped away from the Normal distribution in the frequentist case and is cupped toward the Normal distribution in the Bayes case; this difference is the direct source of the Bayes error.
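The gap between the two values can be checked numerically. The sketch below is illustrative only: it assumes the same series form for $H_2$ as a Poisson mixture of central chi-square distribution functions, takes the observed value $r^0 = 5$, and evaluates the survivor value minus the p-value on an arbitrary grid of $\rho$ values; the function names are hypothetical.

```python
import math

def H2(x, rho, terms=400):
    """Noncentral chi distribution function, 2 degrees of freedom and
    noncentrality rho (assumed Poisson-mixture series; moderate rho only)."""
    if x <= 0.0:
        return 0.0
    t = 0.5 * x * x
    lam = 0.5 * rho * rho
    weight = math.exp(-lam)
    term = math.exp(-t)
    partial = term
    total = 0.0
    for j in range(terms):
        total += weight * (1.0 - partial)
        weight *= lam / (j + 1)
        term *= t / (j + 1)
        partial += term
    return total

def bayes_error(rho, r0):
    """Posterior survivor value s(rho) minus the p-value p(rho)."""
    p = H2(r0, rho)          # frequentist p-value: Pr{chi_2(rho) <= r0}
    s = 1.0 - H2(rho, r0)    # Bayes survivor value: Pr{chi_2(r0) >= rho}
    return s - p

r0 = 5.0
grid = [0.5 * k for k in range(1, 20)]            # rho = 0.5, 1.0, ..., 9.5
errors = [bayes_error(rho, r0) for rho in grid]   # one error per grid point
```

On this grid the error stays positive, in line with the geometric argument: the survivor region contains a half-plane at distance $d$, while the p-value region is contained in one.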

From (1) and (2) we can evaluate the posterior error $s(\rho) - p(\rho) = 1 - H_2(\rho; r^0) - H_2(r^0; \rho)$, which is plotted against $\rho$ in Figure 8 for $r^0 = 5$. This Bayes error here is always greater than zero. This happens widely with a parameter that has curvature, with the error in one or other direction depending on the curvature being positive or negative relative to increasing values of the parameter. Some aspects of this discrepancy are discussed in Dawid, Stone and Zidek (1973) as a marginalization paradox.

Now in more detail for this example, consider the $\beta$ lower quantile $\hat\rho_\beta$ of the Bayes posterior distribution for the interest parameter $\rho$. This $\beta$ quantile for the parameter $\rho$ is obtained from the $\chi_2(r^0)$ posterior distribution for $\rho$, giving

$$\hat\rho_\beta = \chi_{1-\beta}(r^0),$$

where we now use $\chi_\gamma(r)$ for the $\gamma$ quantile of the noncentral chi variable with 2 degrees of freedom and noncentrality $r$, that is, $H_2(\chi_\gamma; r) = \gamma$. We are now in a position to evaluate the Bayes posterior proposal for $\rho$. For this let Propn$(A_\beta; \rho)$ be the proportion of true assertions that $\rho$ is in $A_\beta = (\hat\rho_\beta(r), \infty)$; we have

$$\begin{aligned}
\mathrm{Propn}(A_\beta; \rho) &= \Pr\{\rho \text{ in } (\hat\rho_\beta(r), \infty); \rho\} \\
&= \Pr\{\hat\rho_\beta(r) \le \rho; \rho\} \\
&= \Pr\{\chi_{1-\beta}(r) \le \rho; \rho\},
\end{aligned}$$

where the quantile $\hat\rho_\beta(r)$ is seen to be the $(1 - \beta)$ point of a noncentral chi variable with degrees of freedom 2 and noncentrality $r$, and the noncentrality $r$ has a noncentral chi distribution with noncentrality $\rho$. The actual Proportion under a parameter value $\rho$ can thus be presented as

$$\begin{aligned}
\mathrm{Propn}(A_\beta; \rho) &= \Pr[\chi_{1-\beta}\{\chi_2(\rho)\} \le \rho; \rho] \\
&= \Pr[1 - \beta \le H_2\{\rho; \chi_2(\rho)\}],
\end{aligned}$$

which is available by numerical integration on the real line for any chosen $\beta$ value.
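A short numerical sketch of this Proportion follows. It is one possible route, assumed here for illustration rather than taken from the paper: because $H_2(\rho; r)$ decreases in the noncentrality $r$, the event $1 - \beta \le H_2\{\rho; r\}$ is the event $r \le r^*$ with $H_2(\rho; r^*) = 1 - \beta$, so the Proportion equals $H_2(r^*; \rho)$ and can be found by bisection instead of integration. The series form of `H2`, the search range, and the function names are assumptions.

```python
import math

def H2(x, rho, terms=400):
    """Noncentral chi distribution function, 2 degrees of freedom and
    noncentrality rho (assumed Poisson-mixture series; moderate rho only)."""
    if x <= 0.0:
        return 0.0
    t = 0.5 * x * x
    lam = 0.5 * rho * rho
    weight = math.exp(-lam)
    term = math.exp(-t)
    partial = term
    total = 0.0
    for j in range(terms):
        total += weight * (1.0 - partial)
        weight *= lam / (j + 1)
        term *= t / (j + 1)
        partial += term
    return total

def bayes_proportion(beta, rho, iters=80):
    """Proportion of repetitions in which the Bayes beta-level lower
    quantile falls at or below the true rho."""
    target = 1.0 - beta
    if H2(rho, 0.0) < target:
        return 0.0                 # the assertion fails for every observed r
    lo, hi = 0.0, rho + 12.0       # assumed search range for r*
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if H2(rho, mid) >= target:
            lo = mid               # r = mid still gives a true assertion
        else:
            hi = mid
    return H2(lo, rho)             # Pr{r <= r*} when the true value is rho

# Illustrative evaluation at rho = 5 for three claimed levels.
proportions = {beta: bayes_proportion(beta, 5.0) for beta in (0.1, 0.5, 0.9)}
```

At $\rho = 5$ each computed Proportion falls below its claimed level, which is the behavior the paper reports for this example.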

We plot the actual Propn$(A_{50\%}; \rho)$ against $\rho$ in Figure 9 and note that it is always less than the alleged 50%. We then plot the Proportion for $\beta = 90\%$ and for $\beta = 10\%$ in Figure 10 against $\rho$, and note again that the plots are always less than the claimed values 90% and 10%. This happens generally for all possible quantile levels $\beta$: the actual Proportion is less than the alleged probability. It happens for any chosen value for the parameter; and it happens for any prior average of such $\theta$ values. If by contrast the center of curvature is to the right, then the actual Proportion is reversed and is larger than the alleged.

Fig. 8. The Bayes error $s(\rho) - p(\rho)$ from the $N(\theta, I)$ model with data $y^0 = (5, 0)$.

Fig. 9. Proportion with claimed level $\beta = 50\%$.

In summary, in the vector parameter context with a curved interest parameter the performance error with the Bayes calculation can be of asymptotic order $O(n^{-1/2})$.



Fig. 10. Proportion for claimed $\beta = 90\%$ and for claimed $\beta = 10\%$; strictly less than the claimed. (Horizontal axis: $\rho$.)

6. NONLINEARITY AND MODEL CURVATURE: THE ERRORS ARE $O(n^{-1})$

(i) The model and confidence bound. Taylor series expansions provide a powerful means for examining the large sample form of a statistical model (see, e.g., Abebe et al., 1995; Andrews, Fraser and Wong, 2005; Cakmak et al., 1998). From such expansions we find that an asymptotic model to second order can be expressed as a location model and to third order can be expressed as a location model with an $O(n^{-1})$ adjustment that describes curvature.

Examples arise frequently in the vector parameter context. But for the scalar parameter context the common familiar models are location or scale models and thus without the curvature of interest here. A simple example with curvature, however, is the gamma distribution model: $f(y; \theta) = \Gamma^{-1}(\theta) y^{\theta - 1} \exp\{-y\}$.

Fig. 11. The 97.5% confidence quantile $\hat\theta^C(y) = y - 1.96\{1 + \gamma(y - 1.96)^2/4n\}$. The 97.5% likelihood quantile $\hat\theta^L(y) = (1 + \gamma/2n)[y - 1.96\{1 + \gamma(y - 1.96)^2/4n\}]$ is a vertical rescaling about the origin; the 97.5% Bayes quantile $\hat\theta^B(y)$ with prior $\exp\{a\theta/n + c\theta^2/2n\}$ is a vertical rescaling plus a lift $a/n$ and a tilt $cy/2n$. Can this prior lead to a confidence presentation? No, unless the prior depends on the data or on the level $\beta$.

To illustrate the moderate curvature, we will take a very simple example where $y$ is Normal$\{\theta, \sigma^2(\theta)\}$ and $\sigma^2(\theta)$ depends just weakly on the mean $\theta$, and then in asymptotic standardized form we would have

$$\sigma^2(\theta) = 1 + \gamma\theta^2/2n$$

in moderate deviations. The $\beta$-level quantile for this normal variable $y$ is

$$\begin{aligned}
y_\beta(\theta) &= \theta + \sigma(\theta) z_\beta \\
&= \theta + z_\beta(1 + \gamma\theta^2/2n)^{1/2} \\
&= \theta + z_\beta(1 + \gamma\theta^2/4n) + O(n^{-3/2}).
\end{aligned} \tag{3}$$

The confidence bound $\hat\theta_\beta$ with $\beta$ confidence above can be obtained from the usual Fisher inversion of $y = \theta + z_\beta(1 + \gamma\theta^2/4n)$: we obtain

$$\begin{aligned}
\theta &= y - z_\beta(1 + \gamma\theta^2/4n) + O(n^{-3/2}) \\
&= y - z_\beta\{1 + \gamma(y - z_\beta)^2/4n\} + O(n^{-3/2}).
\end{aligned}$$

Thus, the $\beta$ level lower confidence quantile to order $O(n^{-3/2})$ is

$$\hat\theta^C(y) = y - z_\beta\{1 + \gamma(y - z_\beta)^2/4n\}, \tag{4}$$

where we add the label $C$ for confidence to distinguish it from other bounds soon to be calculated. See Figure 11.
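As a rough numerical check on (4), the sketch below computes the proportion of repetitions in which the bound lies below the true parameter when $y$ is Normal$\{\theta, \sigma^2(\theta)\}$ with $\sigma^2(\theta) = 1 + \gamma\theta^2/2n$. The values $\gamma = 1$, $n = 50$ and $\beta = 0.9$ are arbitrary illustrations, the bisection bracket is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def confidence_quantile(y, beta, gamma, n):
    """Lower confidence quantile (4): y - z{1 + gamma (y - z)^2 / 4n}."""
    z = STD.inv_cdf(beta)
    return y - z * (1.0 + gamma * (y - z) ** 2 / (4.0 * n))

def actual_proportion(theta, beta, gamma, n, half_width=8.0, iters=80):
    """Pr{confidence_quantile(y) < theta} for y ~ Normal(theta, sigma^2(theta)),
    assuming the quantile is increasing in y over the search bracket."""
    sigma = math.sqrt(1.0 + gamma * theta ** 2 / (2.0 * n))
    lo, hi = theta - half_width, theta + half_width
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if confidence_quantile(mid, beta, gamma, n) < theta:
            lo = mid
        else:
            hi = mid
    # lo is now the data value at which the bound crosses theta
    return STD.cdf((lo - theta) / sigma)

# Coverage at a few parameter values for the illustrative setting.
coverage = [actual_proportion(th, 0.9, 1.0, 50) for th in (-2.0, 0.0, 2.0)]
```

The computed proportions sit very close to the nominal 0.9, consistent with the stated $O(n^{-3/2})$ accuracy of the bound.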

(ii) From confidence to likelihood. We are interested in examining posterior quantiles for the adjusted normal model and in this section work from the confidence quantile to the likelihood quantile, that is, to the posterior quantile with flat prior $\pi(\theta) = 1$; this route seems computationally easier than directly calculating a likelihood integral.

From Section 3 and formula (3) above, we have that the prior $\pi(\theta)$ that converts a likelihood $f^L(\theta) = L(\theta; y^0) = F_y(y^0; \theta)$ to confidence $f^C(\theta) = -F_{;\theta}(y^0; \theta)$ is

$$\begin{aligned}
\pi(\theta) = \left.\frac{dy}{d\theta}\right|_{y^0} &= \left. 1 + \gamma z\theta/2n \,\right|_{y^0} \\
&= 1 + \gamma(y^0 - \theta)\theta/2n + O(n^{-3/2}) \\
&= \exp\{\gamma(y^0 - \theta)\theta/2n\} + O(n^{-3/2}).
\end{aligned}$$

Then to convert in the reverse direction, from confidence $f^C(\theta)$ to likelihood $f^L(\theta)$, we need the inverse weight function

$$w(\theta) = \exp\{\gamma\theta(\theta - y^0)/2n\}. \tag{5}$$

Interestingly, this function is equal to 1 at $\theta = 0$ and at $y^0$, and is less than 1 between these points when $\gamma > 0$.

(iii) From confidence quantile to likelihood quantile. The weight function (5) that converts confidence to likelihood has the form $\exp\{a\theta/n^{1/2} + c\theta^2/2n\}$ with $a = -\gamma y^0/2n^{1/2}$ and $c = \gamma$. The effect of such a tilt and bending is recorded in the Appendix. The confidence quantile $\hat\theta^C$ given at (4) is a $1 - \beta$ quantile of the confidence distribution. Then using formula (10) in the Appendix, we obtain the formula for converting confidence quantile to likelihood quantile:

$$\begin{aligned}
\hat\theta^L &= \hat\theta^C\left(1 + \frac{\gamma}{2n}\right) - \gamma y^0/2n + \gamma y^0/2n \\
&= \hat\theta^C\left(1 + \frac{\gamma}{2n}\right).
\end{aligned} \tag{6}$$

Thus, the likelihood distribution is obtained from the confidence distribution by a simple scale factor $1 + \gamma/2n$; this directly records the consequence of the curvature added to the simple normal model by having $\sigma^2(\theta)$ depend weakly on $\theta$.

(iv) From likelihood quantile to posterior quantile. Now consider a prior applied to the likelihood distribution. A prior can be expanded in terms of standardized coordinates and takes the form $\pi(\theta) = \exp(a\theta/n^{1/2} + c\theta^2/2n)$. The effect on quantiles is available from the Appendix and we see that a prior with tilt coefficient $a/n^{1/2}$ would excessively displace the quantile and thus would give posterior quantiles with badly behaved Propn$(\theta)$ in repetitions; accordingly, as a possible prior adjustment, we consider a tilt with just a coefficient $a/n$. We then examine the prior $\pi(\theta) = \exp(a\theta/n + c\theta^2/2n)$. First, we obtain the Bayes quantile in terms of the likelihood quantile as

$$\hat\theta^B = \hat\theta^L\left(1 + \frac{c}{2n}\right) + \frac{a}{n} + \frac{cy}{2n};$$

Fig. 12. $\beta$-level quantiles. The difference $\hat\theta^B(y) - \hat\theta^C(y)$ is the vertical separation above $y$ between quantile curves. The difference $y^C(\theta) - y^B(\theta)$ is the horizontal separation between curves as a function of $\theta$.

and then substituting for the likelihood quantile in terms of the confidence quantile (6) gives

$$\hat\theta^B = \hat\theta^C\left(1 + \frac{\gamma + c}{2n}\right) + \frac{a}{n} + \frac{cy}{2n}. \tag{7}$$

For $\hat\theta^C(y)$ in (4) to be equal to $\hat\theta^B(y)$ in (7) we would need to have $c = -\gamma$ and then $a = \gamma y/2$. But this would give a data dependent prior. We noted the need for data dependent priors in Section 3, but we now have an explicit expression for the effect of priors on quantiles.



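Now consider the difference in quantiles:

$$\begin{aligned}
\hat\theta^B(y) - \hat\theta^C(y) &= \hat\theta^C \frac{\gamma + c}{2n} + \frac{a}{n} + \frac{cy}{2n} \\
&= (y - z_\beta)\frac{\gamma + c}{2n} + \frac{a}{n} + \frac{cy}{2n} \\
&= \frac{a}{n} + \frac{\gamma + 2c}{2n}\, y - \frac{\gamma + c}{2n}\, z_\beta,
\end{aligned}$$

where we have replaced $\hat\theta^C$ by $y - z_\beta$, to order $O(n^{-3/2})$; Figure 12 shows this difference as the vertical separation above a data value $y$. From the third expression above we see that in the presence of model curvature $\gamma$ the Bayesian quantile can achieve the quality of confidence only if the prior is data dependent or dependent on the level $\beta$.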
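The third expression is simple enough to evaluate directly. The following sketch, with hypothetical names and arbitrary illustrative values, shows the separation vanishing for a data dependent prior ($c = -\gamma$, $a = \gamma y/2$) and for a level dependent prior ($c = -\gamma/2$, $a = \gamma z_\beta/4$, a choice worked out here from the expression itself), while a fixed flat prior leaves a separation that changes with $y$.

```python
from statistics import NormalDist

def quantile_gap(y, z_beta, gamma, a, c, n):
    """Bayes quantile minus confidence quantile, third expression above:
    a/n + (gamma + 2c) y / 2n - (gamma + c) z_beta / 2n."""
    return a / n + (gamma + 2.0 * c) * y / (2.0 * n) - (gamma + c) * z_beta / (2.0 * n)

gamma, n = 1.0, 50                   # illustrative curvature and sample size
z = NormalDist().inv_cdf(0.9)        # illustrative level beta = 90%

# Data dependent prior: c = -gamma and a = gamma * y / 2 at each data value.
gap_data_dep = [quantile_gap(y, z, gamma, gamma * y / 2.0, -gamma, n)
                for y in (-1.0, 0.5, 3.0)]

# Level dependent prior: c = -gamma / 2 and a = gamma * z / 4.
gap_level_dep = [quantile_gap(y, z, gamma, gamma * z / 4.0, -gamma / 2.0, n)
                 for y in (-1.0, 0.5, 3.0)]

# Fixed flat prior a = c = 0: the separation varies with the data value y.
gap_flat = [quantile_gap(y, z, gamma, 0.0, 0.0, n) for y in (-1.0, 0.5, 3.0)]
```

No single pair of constants $(a, c)$ chosen without reference to $y$ or $\beta$ removes both the $y$ term and the $z_\beta$ term when $\gamma \neq 0$, since that would need $c = -\gamma/2$ and $c = -\gamma$ simultaneously.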

Similarly, we can calculate the horizontal separation corresponding to a $\theta$ value, and obtain

$$\begin{aligned}
y^C(\theta) - y^B(\theta) &= \frac{\gamma + c}{2n}\,\theta + \frac{a}{n} + \frac{c}{2n}(\theta + z_\beta) \\
&= \frac{\gamma\theta}{2n} + \frac{a}{n} + \frac{c}{2n}(2\theta + z_\beta).
\end{aligned} \tag{8}$$

This gives the quantile difference, the confidence quantile less the Bayes quantile, as a function of $\theta$; see Figure 12, and observe the horizontal separation to the right of a parameter value $\theta$.

Fig. 13. The actual Proportion with claimed level $\beta = 50\%$.

A Bayes quantile cannot generate true statements concerning a parameter with the reliability of confidence unless the model curvature is zero, that is, unless the model is of the special location form where Bayes coincides with confidence. The Bayes approach can thus be viewed as having a long history of misdirection.

Now let $\theta$ designate the true value of the parameter, and suppose we examine the performance of the Bayesian and frequentist posterior quantiles. In repetitions the actual proportion of instances where $y < y^C(\theta)$ is of course $\beta$. The actual proportion of cases with $y < y^B(\theta)$ is then

$$\mathrm{Propn}(\theta) = \beta - \left\{\frac{\gamma\theta}{2n} + \frac{a}{n} + \frac{c}{2n}(2\theta + z_\beta)\right\}\phi(z_\beta),$$

where for the terms of order $O(n^{-1})$ it suffices to use the $N(\theta, 1)$ distribution for $y$. The Bayes calculation claims the level $\beta$. The choice $a = 0$, $c = 0$ gives a flat prior in the neighborhood of $\theta = 0$, which is the central point of the model curvature. With such a choice the actual Proportion from the Bayes approach is deficient by the amount $\gamma\theta\phi(z_\beta)/2n$. For a claimed $\beta = 50\%$ quantile see Figure 13 for the actual Proportion and for a claimed $\beta = 90\%$ or $\beta = 10\%$ see Figure 14. Thus, the $\beta$ quantile by Bayes is consistently below the claimed level $\beta$ for positive values of $\theta$, and consistently above the claimed level for negative values of $\theta$.
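The Proportion formula can be compared with a direct calculation. The sketch below is illustrative only: it takes the confidence quantile (4) and the Bayes quantile (7) as given, finds by bisection the data value at which the Bayes bound crosses the true $\theta$, and evaluates the Normal$\{\theta, \sigma^2(\theta)\}$ distribution function there. The setting $\gamma = 1$, $n = 50$, $\beta = 0.9$, the bracket, and all names are assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def confidence_quantile(y, z, gamma, n):
    """Formula (4)."""
    return y - z * (1.0 + gamma * (y - z) ** 2 / (4.0 * n))

def bayes_quantile(y, z, gamma, a, c, n):
    """Formula (7) for the prior exp(a theta / n + c theta^2 / 2n)."""
    scale = 1.0 + (gamma + c) / (2.0 * n)
    return confidence_quantile(y, z, gamma, n) * scale + a / n + c * y / (2.0 * n)

def exact_proportion(theta, beta, gamma, a, c, n, iters=80):
    """Pr{Bayes bound < theta}, assuming the bound is increasing in y."""
    z = STD.inv_cdf(beta)
    sigma = math.sqrt(1.0 + gamma * theta ** 2 / (2.0 * n))
    lo, hi = theta - 8.0, theta + 8.0      # assumed bracket for the crossing
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bayes_quantile(mid, z, gamma, a, c, n) < theta:
            lo = mid
        else:
            hi = mid
    return STD.cdf((lo - theta) / sigma)

def approx_proportion(theta, beta, gamma, a, c, n):
    """Propn(theta) from the displayed O(1/n) formula."""
    z = STD.inv_cdf(beta)
    delta = gamma * theta / (2.0 * n) + a / n + c * (2.0 * theta + z) / (2.0 * n)
    return beta - delta * STD.pdf(z)

# Flat prior (a = c = 0) at theta = 2 and theta = -2.
exact_pos = exact_proportion(2.0, 0.9, 1.0, 0.0, 0.0, 50)
exact_neg = exact_proportion(-2.0, 0.9, 1.0, 0.0, 0.0, 50)
approx_pos = approx_proportion(2.0, 0.9, 1.0, 0.0, 0.0, 50)
approx_neg = approx_proportion(-2.0, 0.9, 1.0, 0.0, 0.0, 50)
```

With $\gamma > 0$ the direct calculation falls below the claimed level at $\theta = 2$ and above it at $\theta = -2$, and it agrees with the displayed formula to within the neglected higher-order terms.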

In summary, even in the scalar parameter context, an elementary departure from simple linearity can lead to a performance error for the Bayes calculation of asymptotic order $O(n^{-1})$. And, moreover, it is impossible by the Bayes method to duplicate the standard confidence bounds: a stunning revelation!

Fig. 14. The actual Proportion with claimed levels $\beta = 90\%$ and $\beta = 10\%$.

7. THE PARADIGM 

The Bayes proposal makes critical use of the conditional probability formula $f(y_1 \mid y_2^0) = c f(y_1, y_2^0)$. In typical applications the formula has variables $y_1$ and $y_2$ in a temporal order: the value of the first $y_1$ is inaccessible and the value of the second $y_2$ is observed with value, say, $y_2^0$. Of course, the value of the first $y_1$ has been realized, say, $y_1^r$, but is concealed and is unknown. Indeed, the view has been expressed that the only probabilities possible concerning such an unknown $y_1^r$ are the values 0 or 1 and we don't know how they would apply to that $y_1^r$. We thus have the situation where there is an unknown constant $y_1^r$, a constant that arose antecedent in time to the observed value $y_2^0$, and we want to make probability statements concerning that unknown antecedent constant. As part of the temporal order we also have that the joint density became available in the order $f(y_1)$ for the first variable followed by $f(y_2 \mid y_1)$ for the second; thus, $f(y_1, y_2) = f(y_1) f(y_2 \mid y_1)$.

The conditional probability formula itself is very much part of the theory and practice of probability and statistics and is not in question. Of course, limit operations are needed when the condition $y_2 = y_2^0$ has probability zero, leading to a conditional probability expression with a zero in the denominator, but this is largely technical.

A salient concern seemingly centers on how probabilities can reasonably be attached to a constant that is concealed from view. The clear answer is in terms of what might have occurred given the same observational information: the corresponding picture is of many repetitions from the joint distribution giving pairs $(y_1, y_2)$; followed by selection of pairs that have exact or approximate agreement $y_2 = y_2^0$; and then followed by examining the pattern in the $y_1$ values among the selected pairs. The pattern records what would have occurred for $y_1$ among cases where $y_2 = y_2^0$; the probabilities arise both from the density $f(y_1)$ and from the density $f(y_2 \mid y_1)$. Thus, the initial pattern $f(y_1)$ when restricted to instances where $y_2 = y_2^0$ becomes modified to the pattern $f(y_1 \mid y_2^0) = c f(y_1, y_2^0) = c f(y_1) f(y_2^0 \mid y_1)$.

Bayes (1763) promoted this conditional probability formula and its interpretation for statistical contexts that had no preceding distribution for $\theta$, and he did so by introducing the mathematical prior. He did provide, however, a motivating analogy and the analogy did have something extra, an objective and real distribution for the parameter, one with probabilities that were well defined by translational invariance. Such a use of analogy in science is normally viewed as wrong, but the needs for productive methodology were high at that time.

If $\pi(\theta)$ is treated as being real and descriptive of how the value of the parameter arose in the application, it would follow that the preceding conditional probability analysis would give the conditional description

$$\pi(\theta \mid y^0) = c\,\pi(\theta) f(y^0; \theta) = c\,\pi(\theta) L^0(\theta).$$

The interpretation for this would be as follows: In many repetitions from $\pi(\theta)$, if each $\theta$ value was followed by a $y$ from the model $f(y; \theta)$, and if the instances $(\theta, y)$ where $y$ is close to $y^0$ are selected, then the pattern for the corresponding $\theta$ values would be $c\,\pi(\theta) L^0(\theta)$. In other words, the initial relative frequency $\pi(\theta)$ for $\theta$ values is modulated by $L^0(\theta)$ when we select using $y = y^0$; this gives the modulated frequency pattern $c\,\pi(\theta) L^0(\theta)$. The conditional probability formula as used in this context is often referred to as the Bayes formula or Bayes theorem, but as a probability formula it long predates Bayes and is generic; for the present extended usage it is also referred to as the Bayes paradigm (Bernardo and Smith, 1994).
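The repetition-and-selection reading can be imitated by simulation. The sketch below assumes, purely for illustration, a real source distribution $\pi(\theta)$ that is Normal(0, 1), a model in which $y$ is Normal$(\theta, 1)$, an observed value $y^0 = 1$ and a selection window of half-width 0.05; none of these values come from the paper. Under these assumptions $c\,\pi(\theta) L^0(\theta)$ is the Normal$(y^0/2, 1/2)$ density, so the selected $\theta$ values should show mean near 0.5 and variance near 0.5.

```python
import random
import statistics

def selected_thetas(y0, n_rep=200_000, window=0.05, seed=0):
    """Repeat: draw theta from the assumed source Normal(0, 1), then y from
    the model Normal(theta, 1); keep theta whenever y lands close to y0."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_rep):
        theta = rng.gauss(0.0, 1.0)     # value of the parameter in this repetition
        y = rng.gauss(theta, 1.0)       # data generated from the model
        if abs(y - y0) < window:        # selection: approximate agreement with y0
            kept.append(theta)
    return kept

kept = selected_thetas(1.0)
kept_mean = statistics.fmean(kept)      # compare with y0 / 2 = 0.5
kept_var = statistics.variance(kept)    # compare with 1 / 2
```

The agreement depends entirely on the first draw being real: the simulation has an actual source for $\theta$, which is the feature the text distinguishes from a purely mathematical prior.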



The Bayes example as discussed in Sections 2 and 3 examined a location model $f(y - \theta)$ and the only prior that could represent location invariance is the constant or flat prior in the location parameterization, that is, $\pi(\theta) = c$. This of course does not satisfy the probability axioms, as the total probability would be $\infty$. The step, however, from just a set of $\theta$ values with related model invariance to a distribution for $\theta$ has had the large effect of emphasizing likelihood $L^0(\theta)$, as defined by Fisher (1935). And it has also had the effect, perhaps unwarranted, of suggesting that the mathematical posterior distribution obtained from the paradigm could be treated as a distribution of real probability. If the parameter to variable relationship is linear, then Section 3 shows that the calculated values have the confidence (Fisher, 1935; Neyman, 1937) interpretation. But if the relationship is nonlinear, then the calculated numbers can seriously fail to have that confidence property, as determined in Sections 4-6; and indeed fail to have anything with behavior resembling probability. The mathematical priors, the invariant priors and other generalizations are often referred to in the current Bayesian literature as objective priors, a term that is strongly misleading.

In other contexts, however, there may be a real source for the parameter $\theta$, a source with a known distribution, and thus fully entitled to the term objective prior; of course, such examples do not need the Bayes approach, as they are immediately analyzable by probability calculus. And, thus, to use objective to also refer to the mathematical priors is confusing.

In short, the paradigm does not produce proba- 
bilities from no probabilities. And if the required 
linearity for confidence is only approximate, then 
the confidence interpretation can correspondingly be 
just approximate. And in other cases even the con- 
fidence interpretation can be substantially unavail- 
able. Thus, to claim probability when even confi- 
dence is not applicable does seem to be fully con- 
trary to having acceptable meaning in the language 
of the discipline. 

8. OPTIMALITY 

Optimality is often cited as support for the Bayes approach: If we have a criterion of interest that provides an assessment of a statistical procedure, then optimality under the criterion is available using a procedure that is optimal under some prior average of the model. In other words, if you want optimality, it suffices to look for a procedure that is optimal for the prior-average version of the model. Thus, restrict one's attention to Bayes solutions and just find an appropriate prior to work from. It sounds persuasive and it is important.

Of course, a criterion as mentioned is just a nu- 
merical evaluation and optimality under one such 
criterion may not give optimality under some other 
criterion; so the choice of the criterion can be a ma- 
jor concern for the approach. For example, would we 
want to use the length of a posterior interval as the 
criterion or say the squared length of the interval or 
some other evaluation; it makes a difference because 
the optimality has to do with an average of values 
for the criterion and this can change with change in 
the criterion. 

The optimality approach can lead to interesting results but can also lead to strange trade-offs; see, for example, Cox (1958) and Fraser and McDunnough (1980). For if the model splits with known probabilities into two or several components, then the optimality can create trade-offs between these; for example, if data sometimes is high precision and sometimes low precision and the probabilities for this are available, then the search for an optimum mean-length confidence interval at some chosen level can give longer intervals in the high precision cases and shorter intervals in the low precision cases as a trade-off toward optimality and toward intervals that are shorter on average. It does sound strange but the substance of this phenomenon is internal to almost all model-data contexts.

Even with a sensible criterion, however, and without the compound modeling and trade-offs just mentioned, there are serious difficulties for the optimality support for the Bayes approach. Consider further the example in Section 6 with a location Normal variable where the variance depends weakly on the mean: $y$ is Normal$\{\theta, \sigma^2(\theta)\}$ with $\sigma^2(\theta) = 1 + \gamma\theta^2/2n$ and where we want a bound $\hat\theta_\beta(y)$ for the parameter $\theta$ with reliability $\beta$ for the assertion that $\theta$ is larger than $\hat\theta_\beta(y)$.

From confidence theory we have immediately from (4) that

$$\hat\theta(y) = \hat\theta^C(y) = y - z_\beta\{1 + \gamma(y - z_\beta)^2/4n\}$$

with $O(n^{-3/2})$ accuracy in moderate deviations. What is available from the Bayes approach? A prior $\pi(\theta) = \exp\{a\theta/n + c\theta^2/2n\}$ gives the posterior bound

$$\hat\theta^B(y) = \hat\theta^L(y)\{1 + c/2n\} + \frac{a}{n} + \frac{cy}{2n}.$$

The actual Proportion for the $\beta$ level confidence bound is exactly $\beta$. The actual Proportion, however, for the Bayes bound as derived from (8) is

$$\mathrm{Propn}(\theta) = \beta - \left\{\frac{\gamma\theta}{2n} + \frac{a}{n} + \frac{c}{2n}(2\theta + z_\beta)\right\}\phi(z_\beta),$$

and there is no choice for the prior, no choice for constants $a$ and $c$, that will make the actual equal to the nominal unless the model has zero curvature, $\gamma = 0$.

We thus have that a choice of prior to weight the likelihood function cannot produce a $\beta$ level bound. But a $\beta$ level bound is available immediately and routinely from confidence methods, which do use more than just the observed likelihood function.

Of course, in the pure location case the Bayes ap- 
proach is linear and gives confidence. If there is non- 
linearity, then the Bayes procedure can be seriously 
inaccurate. 

9. DISCUSSION 

Bayes (1763) introduced the observed likelihood 
function to general statistical usage. He also intro- 
duced the confidence distribution when the appli- 
cation was to the special case of a location model; 
the more general development (Fisher, 1930) came 
much later and the present name confidence was 
provided by Neyman (1937). Lindley (1958) then 
observed that the Bayes derivation and the Fisher 
(1930) derivation coincided only for location mod- 
els; this prompted continuing discord as to the mer- 
its and validity of the two procedures in providing 
a probability-type assessment of an unknown pa- 
rameter value. 

A distribution for a parameter value immediately 
makes available a quantile for that parameter, at 
any percentage level of interest. This means that 
the merits of a procedure for evaluating a parame- 
ter can be assessed by examining whether the quan- 
tile relates to the parameter in anything like the 
asserted rate or level asserted for that quantile. The 
examples in Sections 4-6 demonstrate that depar- 
ture from linearity in the relation between param- 
eter and variable can seriously affect the ability of 
likelihood alone to provide reliable quantiles for the 
parameter of interest. 

There is of course the question as to where the prior comes from and what its validity is. The prior could be just a device, as with Bayes' original proposal, to use the likelihood function directly to provide inference statements concerning the parameter. This has been our primary focus and such priors can reasonably be called default priors.

And then there is the other extreme where the prior describes the statistical source of the experimental unit or more directly the parameter value being considered. We have argued that these priors should be called objective and then whether to use them to perform the statistical analysis is a reasonable question.

Between these two extremes are many variations 
such as subjective priors that describe the personal 
views of an investigator and elicited priors that rep- 
resent some blend of the background views of those 
close to a current investigation. Should such views 
be kept separate to be examined in parallel with ob- 
jective views coming directly from the statistical in- 
vestigation itself or should they be blended into the 
computational procedure applied to the likelihood 
function alone? There would seem to be strong ar- 
guments for keeping such information separate from 
the analysis of the model with data; any user could 
then combine the two as deemed appropriate in any 
subsequent usage of the information. 

Linearity of parameters and its role in the Bayesian 
frequentist divergence is discussed in Fraser, Fraser 
and Fraser (2010a). Higher order likelihood meth- 
ods for Bayesian and frequentist inference were sur- 
veyed in Bedard, Fraser and Wong (2007), and an 
original intent there was to include a comparison of 
the Bayesian and frequentist results. This, however, 
was not feasible, as the example used there for il- 
lustration was of the nice invariant type with the 
associated theoretical equality of common Bayesian 
and frequentist probabilities; thus, the anomalies 
discussed in this paper were not overtly available 
there. 

10. SUMMARY 

A probability formula was used by Bayes (1763) 
to combine a mathematical prior with a model plus 
data; it gave just a mathematical posterior, with 
no consequent objective properties. An analogy pro- 
vided by Bayes did have a real and descriptive prior, 
but it was not part of the problem actually being ex- 
amined. 

A familiar Bayes example uses a special model, a 
location model; and the resulting intervals have at- 
tractive properties, as viewed by many in statistics. 

Fisher (1935) and Neyman (1937) defined confi- 
dence. And the Bayes intervals in the location model 
case are seen to satisfy the confidence derivation, 
thus providing an explanation for the attractive prop- 
erties. 



The only source of variation available to support 
a Bayes posterior probability calculation is that pro- 
vided by the model, which is what confidence uses. 

Lindley (1958) examined the probability formula 
argument and the confidence argument and found 
that they generated the same result only in the Bayes 
location model case; he then judged the confidence 
argument to be wrong. 

If the model, however, is not location and, thus, 
the variable is not linear with respect to the pa- 
rameter, then a Bayes interval can produce correct 
answers at a rate quite different from that claimed 
by the Bayes probability calculation; thus, the Bayes 
posterior may be an unreliable presentation, an un- 
reliable approximation to confidence, and can thus 
be judged as wrong. 

The failure to make true assertions with a promised 
reliability can be extreme with the Bayes use of 
mathematical priors (Stainforth et al., 2007; Hein- 
rich, 2006). 

The claim of a probability status for a statement 
that can fail to be approximate confidence is mis- 
representation. In other areas of science such false 
claims would be treated seriously. 

Using weighted likelihood, however, can be a fruit- 
ful way to explore the information available from 
just a likelihood function. But the failure to have 
even a confidence interpretation deserves more than 
just gentle caution. 

A personal or a subjective or an elicited prior may 
record useful background to be recorded in parallel 
with a confidence assessment. But to use them to 
do the analysis and just get approximate or biased 
confidence seems to overextend the excitement of 
exploratory procedures. 

APPENDIX 

Tilting, Bending and Quantiles 

Consider a variable $y$ that has a Normal$(\theta; 1)$ distribution and suppose that its density is subject to an exponential tilt and bending as described by the modulating factor $\exp\{ay + cy^2/2\}$. It follows easily by completing the square in the exponent that the new variable, say, $\tilde y$, is also normal but with mean $(\theta + a)/(1 - c)$ and variance $1/(1 - c)$. In particular, we can write

$$\tilde y = \frac{\theta + a}{1 - c} + (1 - c)^{-1/2} z,$$

where $z$ is standard normal. And if we let $z_\beta$ be the $\beta$ quantile of the standard normal with $\beta = \Phi(z_\beta)$, then the $\beta$ quantile of $\tilde y$ is

$$\tilde y_\beta = \frac{\theta + a}{1 - c} + (1 - c)^{-1/2} z_\beta.$$

Thus, with the Normal$(\theta, 1)$ we have that tilting and bending just produce a location scale adjustment to the initial variable.
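The completing-the-square result can be checked by direct numerical integration. The sketch below uses a plain trapezoid rule on an assumed range $[-12, 12]$ with illustrative values $\theta = 1$, $a = 0.3$, $c = 0.2$ (so $c < 1$, as the result requires); the function name is hypothetical.

```python
import math

def tilted_moments(theta, a, c, lo=-12.0, hi=12.0, steps=20000):
    """Mean and variance of the density proportional to
    exp(-(y - theta)^2 / 2) * exp(a y + c y^2 / 2), by the trapezoid rule."""
    h = (hi - lo) / steps
    m0 = m1 = m2 = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0          # trapezoid end weights
        f = w * math.exp(-0.5 * (y - theta) ** 2 + a * y + 0.5 * c * y * y)
        m0 += f
        m1 += f * y
        m2 += f * y * y
    mean = m1 / m0
    return mean, m2 / m0 - mean * mean

mean, var = tilted_moments(1.0, 0.3, 0.2)
predicted_mean = (1.0 + 0.3) / (1.0 - 0.2)   # (theta + a) / (1 - c)
predicted_var = 1.0 / (1.0 - 0.2)            # 1 / (1 - c)
```

The integrated mean and variance match $(\theta + a)/(1 - c)$ and $1/(1 - c)$ to numerical precision for these values.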

Now suppose that $y = \theta + z$ is Normal$(\theta; 1)$ to third order, and suppose further that its density receives an exponential tilting and bending described by the factor $\exp\{ay/n^{1/2} + cy^2/2n\}$. Then from the preceding we have that the new variable can be expressed in terms of preceding variables as

$$\begin{aligned}
\tilde y &= \frac{\theta + a/n^{1/2}}{1 - c/n} + (1 - c/n)^{-1/2} z \\
&= \theta(1 + c/n) + a/n^{1/2} + (1 + c/2n) z \\
&= y(1 + c/2n) + a/n^{1/2} + \theta c/2n,
\end{aligned} \tag{9}$$

where succeeding lines use adjustments that are $O(n^{-3/2})$. The second line on the right gives quantiles in terms of the standard normal and the third line gives quantiles in terms of the initial variable $y$.

One application for this arises with posterior distributions. Suppose that $\theta = y^0 + z$ is Normal$(y^0, 1)$ to third order and that its density receives a tilt and bending described by $\exp(a\theta/n^{1/2} + c\theta^2/2n)$. We then have from (9) that the modified variable can be expressed as

$$\begin{aligned}
\tilde\theta &= y^0(1 + c/n) + a/n^{1/2} + (1 + c/2n) z \\
&= \theta(1 + c/2n) + a/n^{1/2} + y^0 c/2n,
\end{aligned} \tag{10}$$

to order $O(n^{-3/2})$.